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Abstract — The Least Absolute Shrinkage and Selection Op- 
erator (LASSO) has gained attention in a wide class of low 
dimensional hard estimation problems with promising results. 
However, such attempts lack strong theoretical background. The 
objective of this work is to provide a framework for such an 
analysis. We follow the logical idea of introducing a "best" 
LASSO estimator covering all possible grid points to avoid 
the grid choice dependence. We derive conditions for which 
the so called homotopy method of implementing the proposed 
Continuous LASSO (CLASS) is consistent in the high SNR 
scenario assuming infinite estimation dynamic range. We also 
show that the analysis of CLASS applies to the conventional 
grid-based LASSO for a sufficiently fine grid. Finally, we give 
the estimation Mean Squared Error (MSE) for such consistent 
cases. We present the problem and the numerical results in the 
context of Direction of Arrival (DOA) estimation. 

Index Terms — IEEEtran, journal, BTgX, paper, template. 



I. Introduction 

FINDING a precise sparse linear representation of a signal 
through the t\ shrinkage and selection operator has been 
an active field of study for more than a decade with many 
applications 0, 0, 0. The LASSO technique 0, also 
known as basis pursuit J5), and the global matched filter 
are among the pioneering such studies. These methods 
received more attention by different modifications [7|, [8], 0, 
iflOl and the invention of effective implementation techniques 
ifTTl . fl2l . Ifl3ll . fl4l . There has also been various attempts 
to formulate and analyze the performance of these methods 
in some asymptotic cases lfl5ll . fl6l . IfTTl . The general idea 
behind such methods is to construct efficient regression vectors 
to get a better representation which leads to consistency 
conditions based on less coherent regressors. More recently, 
there have been different attempts to use LASSO with highly 
correlated regressors dictated by physical models. These meth- 
ods have found a wide application in hard low dimensional 
estimation problems, such as DOA and frequency estimation 
fl8l . because of their simple implementation through convex 
programming techniques |fl9l . However, these attempts lack 
a strong theoretical background. In this work, we propose a 
framework for performing such an analysis. 

Framework 

We present our work in the context of DOA estimation in 
the case of far-field sources and narrowband signals, which 
is a well studied problem f20ll . In iTTSl and ETI the problem 
is reformulated in a sparse framework and solved using the 
LASSO technique. However, the theoretical questions about 
the consistency and the error level of this method is still 



unanswered. Note also that the analysis can be generalized 
by replacing the specific terms such as DOAs, waveforms, etc 
by more general terms such as regressor indexes, regression 
vector, etc, respectively. For the current purpose, the corre- 
sponding terms are equivalently interchangeable. 

The main difference of the current study from the previous 
ones is illustrated by noting the following features: 

a) In the current analysis, a distance function between the 
regressor indexes, a.k.a atom indexes, is defined and we accept 
a close estimate of the sparsity pattern even if it is not the 
true one. Consistency means that the distance between the 
estimated pattern and the true one approaches zero in some 
asymptotic case, which is the high SNR scenario in this work. 

b) Our investigation on the LASSO-based DOA estimator 
concerns the case in which the Restricted Isometry Property 
(RIP) [22 1 does not hold true if the grid is dense enough. 
However, as we will see, this only results in a fundamental 
resolution limit on the method. Still, incoherent atoms may be 
retrieved perfectly even in such a dense grid. 

Methodology 

Definition: The first obstacle in our analysis is related 
to the definition of the LASSO estimator. Practically, this 
method inherits a discretization step which fits the model to 
that of the l\ based regression f23l . However, to remove the 
effect of the grid, one may assume a case of an infinitely 
small grid size, which transforms the optimization to a 
functional optimization. In this work we introduce a different 
framework of estimating a finite set to avoid complications of 
the functional analysis. However, we show that the proposed 
continuous LASSO (CLASS) is equivalent to the natural 
integral optimization extension. We also include some basic 
necessary properties of the proposed CLASS technique. 

Consistency: This work contains the consistency condi- 
tions of CLASS in the high SNR case, which corresponds 
to the so-called noiseless implementation of LASSO. This 
is generally assumed as a hard method to analyze and to 
implement. However as proposed in ifTTIl . the close connection 
between the noisy LASSO for small values of Regularization 
Parameter (RP) and the noiseless LASSO may be exploited 
to review the properties of the noiseless LASSO. This is 
generally known as the homotopy technique, which we propose 
as follows: 

a) The CLASS cost function is a uniform continuum (ho- 
motopy) with respect to the RP between the trivial absolute 
cost at the infinite RP value and noiseless CLASS at zero. 
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b) For each value of the RP, the optimal point is unique. 
Hence, there exists a unique solution component starting from 
the zero vector and stopping at the CLASS noiseless solution. 

c) Under some conditions on DOAs, the solution component 
is consistent in the sense that for sufficiently small level of 
noise, there exists a solution point with the correct number of 
estimates arbitrarily close to the true parameters. It turns out 
that this point converges to the noiseless solution as the noise 
vanishes, and the analysis could be done by linearizing in the 
neighborhood of the noiseless solution. As we see later, there 
are irregular cases, for which there always exists an additional 
faulty DOA estimate, but its corresponding waveform value 
vanishes as the noise tends to zero. In practice, the case might 
be interpreted as a consistent one, since as noise decreases, 
eventually the estimated waveform value reaches outside the 
dynamic range. However, in this work, we stick to the infinite 
dynamic range assumption neglecting the aforementioned 
cases. 

Performance Analysis: The ultimate goal of this paper is 
to give the first order performance of CLASS in presence of 
noise under the consistency conditions. This work will show 
the best achievable performance of CLASS in terms of bias 
and Mean Squared Error (MSE), by choosing a proper point in 
the solution component, given the correct number of sources. 
The simple approach is to find the irregular or the singular 
point in the path with the given model order (|24|). 

Paper Constitution 

In the remaining, we introduce CLASS and its optimal- 
ity conditions in Sections [ill] and IIII-BI respectively. The 
uniqueness and continuity issues will be discussed in Section 
IIV-AI As we see later in Section IIV-D1 the best behavior 
of the CLASS in the high SNR regime is obtained by the 
near-zero analysis of the regularization parameter, generally 
known as the homotopy rule ifTTI . Il24ll . which we discuss in 
Section IIV-BI To provide a computational condition, we use 
the CLASS optimality conditions 1 14] and obtain a consistency 
rule through the homotopy in Section IIV-CI We conclude this 
paper by giving the asymptotic performance in Section IIV-DI 
and showing some results of applying CLASS to the DOA 
estimation problem in Section [V] 

II. Problem Statement 

We start by a brief review of LASSO-based estimation 
of DOAs receiving narrowband signals transmitted by far 
sources. 

A. DOA Estimation Problem 

Considering a set of n sources at directions 6 = 
{8 1, 62, ■ ■ ■ ,0 n }, the received sequence of T sampled signal 
vectors X = [x(l) x(2) . . . x(T)], at an array of m sensors 
can be written as 



X = A(6»)S + N. 



(1) 



where, S = [s(l) s(2) . . . s{T)] and N = [n(l) n(2) . . . n(T)] 
are the sequences of transmitted data and received noise 



respectively, and A(9) = [a(#i) a(^) • • • a(0 n )] is the col- 
lection of the steering vectors corresponding to the DOAs 6. 
Each steering vector is given by (|6|) 

J 2 f r i cos(e-pi) p j 2 fr 2 cos(9-p 2 ) 



8,(9) = 



-Pm) 



(2) 



where (ri,pi) is the polar coordinate pair of the i th sensor 
(i = 1,2, ... ,m) and d is the wavelength at the central fre- 
quency. Then, the goal is to estimate 9 given X. Obviously, the 
problem is defined in a complex-valued space of variables. For 
simplicity, we focus on the uniform half-wavelength (r, = y), 
linear (pi = 0) array. In this case, it is more convenient to write 
(O in terms of the electrical angle (f> = ir cos 9. We may also 
use the notation a(</>) to show the steering vector as a function 
of the electrical angle, whenever there is no risk of confusion. 
We remind that our results are applicable to general estimation 
problems that can be modeled in the form (HJ. 

B. Conventional LASSO-based Solution 

The estimation problem can be solved by first discretizing 
the parameter space and reformulating the model in the 
sparse framework ||T8l . Note that discretization will naturally 
introduce an additional quantization noise to the estimated pa- 
rameters. Consider a big set of grid points 9q from which we 
choose the estimated parameters. Let us denote A 9 = K{Qq). 
Then, the LASSO based estimator is given by 
. 1 



S 9 (A) 



argmm— 
ss 2 



IX- A 9 S 9 | 



All I 



(3) 



where, for every S with Sy as the element in the i th row and 
j th column, 

||S||i,2 = $><,s, (4) 



with 



7i,S 



(5) 



and 7s = [71. s 7i.s ■ • ■ 7n.s]- We also refer to the diagonal 
matrix of the elements in 7s as Ts- We further introduce 
9 (A) C Bq as the set of all DOAs corresponding to the 
nonzero rows in S(A). We will refer to ([3]) as 3? g optimization. 
As explained in O, the estimation can be performed in the 
noiseless case by solving 



SJi = argmin||S 9 ||i, 2 subject to A 9 S 9 = X, 

S3 



(6) 



with the active bases corresponding to directions in 9 nV We 
will refer to this optimization as ^ s , n i- 

III. Continuous LASSO 

A natural method of extending the above technique to 
estimate continuous DOA parameters is to let the grid size 
increase so that A 9 covers all possible directions. In this case, 
one may assume that the summations tend to proper integrals. 
However, due to technical difficulties we avoid such a method, 
and instead introduce the following optimization which leads 
to an extension of In the end of this section, we show the 
equivalence of this method to the former integral extension. 
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A. CLASS construction 

The CLASS construction is based on the observation that 
the LASSO solution is always sparse. Denoting 

(7) 



where the minimum is taken over all T-dimensional complex 



P(X,6,X) = min*(X,0,S,A), 



where 



*(X,0 ! S,A) = i||X-A(0)S||^ + A||S|| 1 , 2) (8) 

it is simple to check that P(X, 0, A) exists for each A > and 
it is a continuous function of (9,X) G [0 2ir] n x M for each 
n E N. Moreover, (Q is an extension of ©. In fact, (O can 
easily be obtained from by substituting 6 = 9q. We call 
a DOA set 6 reducible at A if there exists a global minimum 
point S of (|7), at least one row of which is completely zero. 
In this case, 9 can be reduced to a 9' C by removing 
the DOAs corresponding to the zero rows. Note that if 9 can 
be reduced to 9' at A, then P(X,0,A) = P(X,0',A). The 
final observation is that if the dimension n of 9 is greater 
than 2m it is reducible, due to the convex conic nature of the 
optimization. Next, we define 

0(X, A) = argmiriP(X, 9, A) 
see 

subject to 9 is irreducible at A, (9) 

oo 

where 9 = (J [0 7r] fc is the algebra of all finite subsets of 

k=l 

the interval [0 tt]. Note that due to the constraint, the solution 
dimension is bounded by 2m and 9 always exists, since the 
optimization can be confined to a finite union of compact sets. 
We refer to <j9j as & t optimization. We can also define the 
noiseless continuous optimization ^ n i by 

{0ni,S n i} = argmin||S||i subject to A(0)S = X. (10) 

see, s 

The goal of this paper is to discuss the performance of this 
continuous version of LASSO in the asymptotic low noise 
scenario of the DOA data model. This will be mainly discussed 
in Section [IV] The consistency of the £P g optimization will 
also be presented. We finish this section by stating some 
basic observations and introducing some new definitions in the 
noiseless scenario which will be necessary for the analysis. 

B. Preliminary Observations 

This section contains two main results. First, we rationalize 
the idea of CLASS by comparing it with another natural 
extension. Next, we give a set of optimality conditions for 
&t- We start by discussing a straightforward extension of 
(01 by letting the grid size increase so that the sums tend to 
integrals. On the limit, the vector S 9 tends to a vector measure 
s : SS\a n] — > C T , where 3B\a j is the restriction of the 
real Borel a-algebra to [0 tt]. Then, we can show the following. 

Theorem 1. For a certain realization of X, assume that the 
S?t solution is given by 0(X, A). Then, 

2 



P(X,0(X,A),A)=rmni 



X 



[0 7r] 



a{9)ds H 



+ A||s||, 
(11) 



vector measures on 88]n n i and ||s|| denotes the total variation 
of s over [0 tt]. 

Proof: The proof is given in l25ll . 

■ 

Now, we give a general characterization of the global 
minimum of & t . Indeed, this plays a central role in the later 
analysis which includes some steps as follows. 

Theorem 2. Suppose S is an n x T matrix with Sj as the i th 
row and = [6 X 2 ... n \. By introducing N = X- A(0)S, 
the matrix S is a global minimum in (0 if and only if first, 

a H (f?,)N = X^S h (12) 

for every 9i that corresponds to an active row in S, and second, 



.Hi 



)N|| 2 < A, 



(13) 



for i = 1, 2, . . . , n. 

Proof: See iTRl for a proof. 

■ 

For the sake of simplicity, we denote the row in S corre- 
sponding to the DOA 9 in 9 by Sg. From Theorem[2]two basic 
results can be inferred. We explain them in the following two 
corollaries. 

Corollary 1. CLASS optimality condition. The matrix S and 
the DOA vector 9 in Theorem [2] form a global minimum for 
(0 if and only if S is irreducible and the following conditions 
are satisfied: 



A H (9)N = ATg 1 S 

max ||a H (0)N|| 2 < A, 

0<6»<7r 



(14) 



Proof: First, assume that S is an irreducible global min- 
imum for the S? t . Then, for each DOA 9 with < 9 < tt, 
the vector Si = [S H 0] H is a global minimum of (0 with 
9\ — [9 9o}. Using Theorem [2] we conclude ( fl4l ). Now, 
suppose that S and 9 satisfy (fl4l) . For each different DOA set 
0i, introducing 2 = 0i U 9 we observe that S 2 = [S H 0} H 
satisfies the conditions in Theorem |2 Thus, 2 is reducible to 

and P(X, 2 , A) = P(X, 0, A). On the other hand because 

01 C 2 we have P(X,0 2 ,A) < P(X,0i,A). Putting the 
results together, we complete the proof. ■ 

Corollary 2. Second order sufficiency. The 3P t solution and 
the solution to (0 are only functions of the observed sample 
correlation matrix JH X = yXX ff . 

Proof: We only need to show that the solution to S in ( fT2l 
is a function of R x . This can easily be shown by substituting 
the definition of N in Theorem [2] into ( fT2l to obtain 



(A H A 



AT S ) 'X, 



(15) 



which solves S with respect to Ts- According to the fact that 
Ts consists of the diagonal elements of SS H , and substituting 
( fT5] l, it is obvious that Ts and, consequently, S are functions 
of R x only. ■ 
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As a result of Corollary [2] we may assume that X is of full 
column rank. Otherwise, there exists a full rank matrix X' 
with fewer columns and the same sample correlation matrix. 
However, to maintain the statistical properties of X, we avoid 
such a restriction in this stage. 

A similar set of results can be obtained for the noiseless 
optimization 3^t,nU which is the modified version of the 
results, such as the null space property originally given in 
[26 1, on which the RIP theory of perfect recovery is based. 
For a complete analysis see [25 1. 

IV. Performance Analysis 

A. General properties of the solution path 

We start this section by a nice result showing that the 
CLASS optimization over the ULA manifold is unique. 
Although unproved, we also claim that this theorem is valid 
for all unambiguous array manifolds based on pure empirical 
observations. 



Theorem 3. The CLASS solution uniqueness. The solutions 
to £?t,n\ and 3? t , for every A > 0, are unique and, at most, of 
dimension n = m — 1. 

Proof: The proof is given in Appendix [A] 



B. The near-zero behavior of the LASSO path 

The homotopy rule also enables us to perform the analysis 
for very small values of A and small deviations AX from the 
noiseless measurement X, in which case we may utilize the 
first order expansion tools. As previously explained, we use 
the notation a(^) to show the steering vector as a function 
of the electrical angle in the ULA case. We confine the 
analysis for the cases in which, fixing X, the dimension of 
the solution is equal to the dimension of the noiseless solution 
in a sufficiently small neighborhood of A = 0. We may call 
such case a pure case. We also denote d((9) = da ^' and 
D(0) = [d(0i) d(0 2 ) . . .d(6>„)]. As we have already shown 
by continuity, for sufficiently small values of A and AX, the 
solution is close to the noiseless one. Let us say that the 
solution pair at such a point is given by (9 + AO, S + AS), 
where (9,S) is the solution of the &t,n\- We recall that we 
neglect the possibility of the existence of additional vanishing 
DOAs. Expanding the conditions in Theorem |2j up to the 
second order, we get 



\a. H (9) (AX - A(0)AS - D(0)AQ'S) || 2 < A 



.Hi 



i) (AX - A(0)AS - D(0)AQU) = AU. t 



(16) 



(17) 



As a remark, note that the global sparsity upper bound of 
n = m — 1 is not obvious in the complex case. The reader 
may verify that the resulting sparse vector from complex 
LASSO by a random regression matrix can be nonzero in 
possibly 2m — 1 elements. This tighter bound reflects the 
unique characteristics of the array manifold. The immediate 
consequence of Theorem [3] is that the unique solutions of 2? t 
and 3^t,vA are continuous with respect to all their variables (X 
and A). For the current purpose, we focus on the continuum 
at A = 0. However, due to multidimensionality, the closeness 
definition should be clarified on 0. A general expectation is 
that two close solution pairs (0j,Si) with i = 1,2 get close 
in a sense that their elements corresponding to "far" DOAs 
gradually vanish. The S? t continuum may also extend to 
A = 0, in which case it approaches the &t,n\ solution. This 
is generally known as the homotopy principle in the literature 
[24|. The following theorem summarizes and clarifies these 
ideas. 



Theorem 4. Solutions to 3?t and &t,n\ are continuous, i.e. 
for every A > 0, X, and e > 0, there exists 8 > so that if 
max{||AX||, ||AA||} < S, then for all £ 0(X+AX,A+AA) 
either there exists a 9' g 0(X, A) so that max{|0 - 9'\, \\Sg - 
S e /||} < e or ||S '|| <e. 

Furthermore, as A tends to zero, the solution to tends 
to a solution of &>t,nl m the above sense. 

Proof: The proof follows from a standard procedure. 
However, due to the different continuum definition we give 
a sketch of it in Appendix [B] . 

■ 

As we have already stated, the continuity of the LASSO path 
at A = is widely referred to as the homotopy property. 



where we introduced U = Tg 1 S, with U$ as the i th row, 
and AQ = AQT with AQ' as the diagonal matrix whose 
diagonal elements are identical to A0. Note that ||Uj||2 = 1. 

We observe from ( fTSI l that Q{ is a max- 
imum point for the function /(#o) = 
\\a H {9 ) (AX - A(0)AS - D(0)AQU) || 2 . Thus, its 
first derivative is equal to zero and we get 



Re (d H (9,) (AX - A(0)AS - D(0)AQU) Uf ) 



0. 



(18) 

We can solve ( fT8l and ( TTTb to obtain the first order expan- 
sions. Let us define = I— A(A ff A) -1 A H as the orthogo- 
nal projection matrix of the operator A, the matrix S = XJJJ H 
with £j as the i th column, and R = Re [D^D p H T ] , where 
Dp = P^D and denotes the element wise multiplication. 
Then, we have A0 = S + A/3 where 



= T^R-iRe 



S = T^R-iRe 



df A{A H A)- 1 ^ 1 



d^A(A«A)- 1 e i 



dfP^AXUf 



d^PiAXU^ 



(19) 



(20) 



where we dropped the arguments of A(0) and 
D(0) for simplicity. Further, we get AS = 
(A^A)- 1 [A H (AX - DAQU) - AU] . We recognize 
d20b as the first order expansion of the standard NLLS 
estimation error (27), whereas dl9l) represents the contribution 
due to the l\ regularization. 
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C. Consistency Analysis 

In this section, we discuss the correctness of the solution in 
the asymptotic noiseless case. It turns out that the LASSO 
based estimation is not generally consistent. Our work is 
based on the definitions given in l28l . where an asymptotic 
consistency resolution lower bound of A<f> = — is computed 
for the case of ULAs by trying the null space property on 
a set of candidate null vectors and the discretized LASSO 
based estimates. Likewise, we focus on the case n = 2 and 
look for a condition on the separation of the two true DOAs, 
under which the LASSO based estimator gives correct result 
irrespective of the true source vector S. As a comparison, 
note that there is no such consistent case for the classical 
beamforming technique 1201 . which cannot resolve sources 
with too different power levels. Also, note that to investigate 
the consistency one should consider the case of infinitesimal 
noise rather than the noiseless one. As usual, there is a clear 
relation between the two cases given by the homotopy rule. 
Thus, we start by the noiseless case (AX = 0) and look for 
the so called perfect recovery bounds. Finally, note that in the 
case of there exists a simple symmetry in linear arrays. It is 
simple to see that fixing the true S, a rigid shift in true DOA 
electrical angles results in a rigid shift in the CLASS solution. 
Thus, we may take two DOAs 9 = ■§ , § + A9 corresponding 
to <j) — Oj respectively without loss of generality if the 
array is uiform and linear. 

We use the condition ( fl6] l with the results in Section IIV-BI 
to obtain an upper bound. If the noiseless solution is identical 
to the true solution, (fT~6T > should be correct with the true 
parameters and the ones computed in ( fl9l l. One may show 
that ([Tol l is also a sufficient condition for the pure cases. Note 
that there may exist consistent impure cases, which obviously 
do not satisfy ( fl6l l. Thus, what we obtain is an upper bound for 
a more general framework, in which there is a finite dynamic 
range. Since AX = 0, we have 8 = 0. Then, AO = A/3 and 
AS = -A(A^A)- 1 [A^DAQo + 1] U, where AQ a is the 
diagonal matrix of the elements of Ts(3- We can further write 
(fT~6b as follows 



V0, \\ a H (9)(PiT>AQ -A(A H Ay 1 )S^ 



< 1 



O(A), 
(21) 

Letting A tend to zero, we observe that d2Tb is also a zero order 
necessary condition removing the term O(A). This means that 

Result 1. For any noiseles solution of CLASS, we have 
V0, ||a H (0) (PiDAQo - AtA^AJ-^S*!^ 1. (22) 



For a certain m, at a DOA set 0, if the condition (1221 is 
true for every 9 and a certain choice of H, this DOA set is 
said to be consistent over E. Note that AQ is also a function 
of E. We can also simplify the analysis by focusing on the 
asymptotic case of very large number of sensors m, A more 
general terminology is given as follows. 

Definition 1. For a fixed m, a DOA set is said to be 
absolutely consistent if ( l22l is true for every choice of E. 
A DOA set is called almost consistent if d22l is true for 
almost all matrices E. Furthermore, a sequence 9 rn is called 
asymptotically consistent if for every E there exists a large 



number M such that for m > M, the set 9 m is consistent over 
E. Finally, a sequence is almost asymptotically consistent if 
the later condition is true for almost all matrices E. 

Asymptotic Consistency for ULA 

Using these definitions, we provide some asymptotic con- 
sistency conditions when the array is uniform and linear. 
We first need to introduce the functions F a (8) = —jf^-, 
G a (5) = ^f, and H a (S) = We also define the 



2x2 matrices F a (S) 



1 



Fa(S) 



Fa(S) 
1 



, n a (d) 



\3 G a {8) 
_ G a (-5) |j 
the 1x2 vectors g 
F(8) F(S + S')}. 
Neglecting the details, when m 



G a (S) 
H a {5) ' 



H a (-6) | 
(S',S) = [G(S)"G(6 + 8')] and f(<5 r ,<5) 



and 



> oo, (l22l leads to 

Result 2. assuming a sequence (p m = [0 A</> m ], 

• (pm is almost asymptotically consistent if mA</> TO — > oo. 

• It is not almost asymptotically consistent if A(p rn = 
o(i). 

• If A(p m ~ — with a fixed 8 the sequence is almost 
asymptotically consistent if for every 8' G M 

\\( Sa (8',8)-f a (8 , ,8)F- 1 G a )AQ a -f a (S',8)F-% < 1 

(23) 

Result |2] shows that in the above asymptotic case, the 
inconsistency is reflected by the resolution limit of the form 
where 5 is the smallest number satisfying d23l . In Section 
[VI we use d23l and (1221 to find out a threshold rate in the 
asymptotic case and the resolution of small arrays respectively. 

Discretizing Consistency 

Finally, in this section we review the behavior of the 
discretized LASSO based estimator for a very fine grid 
of DOAs. We define A (01,0a) = max min \6\ - 9 2 \ as the 

asymmetric distance between two DOA sets 9\ and 02- For 

each grid 0, we also define C(9) — max min \ 6 — 6'\ as the 

ee[o tt] e>ee 

measure of fineness. 

Theorem 5. Discretizing consistency. Suppose that the &t 
minimum point is given by 0q. Then, for each e > there 
exists a 8 > so that if A(0o, 0) < 8 then can be reduced 
to a DOA set 0' so that A(0', 0) < e. 

Proof: The proof is straightforward noting the uniqueness 
theorem. Assume that the theorem is not true. Then, there must 
exist an e and a sequence of DOAs 9 n reducible to 9' n such 
that A(0 O ,0„) < - but A(9' n ,9) > e. It is then simple to 
see that this leads to a & t minimum point different than 9q 
which is in contradiction with uniqueness. 

■ 

Theorem [5] shows that if the grid 9q in 3P g satisfies 
C(0g) < 5 w i m a proper 8 then the active basis of the £? g 
solution are e close to the true solution of £P t . Thus, the 
analysis of CLASS applies to the grid-based LASSO for a 
sufficiently fine grid. 
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D. Asymptotic Estimator Variance 

In this section, we complete the LASSO based DOA 
estimation performance analysis by giving the asymptotic 
performance of the CLASS estimator in the low noise case. In 
this case, the noiseless solution will not give the right number 
of DOAs and the optimization with A > should be 
utilized to compensate. Assume that the true DOA set 9 is pure 
and consistent. The noiseless solution parameters are equal to 
the true ones. Thus, we may use the true parameters using 
( fl9] >. In this case, 7i s is the sample Root Mean Square (RMS) 
of the i th source and S is the sample correlation coefficient 
matrix of the sources. The parameter A can be computed as 
the smallest value for which the computed perturbations satisfy 
( [T6l >. Recall that the consistency guarantees that for sufficiently 
small AX = N such a A value exist. We assume a white, 
Gaussian, centered, circularly symmetric noise vector process 
rij ~ JV(0,C) and N = [ni na.-.ivr]. According to the 
results in Section IIV-BI we have AO = 8 + A/3 where S and 
j3 are given in d20l > and ( fl9] l respectively. Recall that S is 
actually the asymptotic Maximum Likelihood (ML) error in 
high SNR scenario |27l . 

Result 3. The asymptotic statistical DOA bias is then given 
by 

£{A0) = £(X)(3, (24) 

where we used the fact that £ (S) = 0. The asymptotic error 
covariance is 

Cov(A6>) = 

ir-iR^Rep^CDp HjR-ir- 1 + Var(A)/3/3 T ,(25) 
where more details are presented in Appendix ICl 

As seen, the error statistics are affected by the choice of 
A. We remind that the strategy we follow here is to choose A 
such that CLASS gives the correct number of sources, which 
is possible only for pure cases in high SNR. Due to d25l >. in 
terms of minimum error, the best choice is the smallest such 
A. Thus, using ( fl6] l, we get 

A = min A' 
such that 

V0, A' > ||a H (6»){Pi(N - DAQU) + A'A(A ff A) _1 U}| 

(26) 

Let us denote the set of all non-negative values of A sat- 
isfying the inequality in d26i > for a certain 9 by Sg. Then 

A = minfl'S'e- Note that for each e Q , S e = [0 oo) 

6 

while for other values Sg = [A(0) oo), where A(0) can be 
found by solving the third line of (l26b with equality. Now, 
obviously, A = maxA(0) (see a similar argument in [14| for 
more illustration). For large enough m, the relation d26l i can 
be simplified. We assume that A(0) for different values of 
6 is almost at the same level so that 6\ = argm,axA(0) is 
well-distributed such that Pr(min \0 X - 6\ = O(^)) = 0, 

9E9 

where is the true DOA electrical angle set. This means 
that a ff (6»i)A = o(m), a Ii (0 1 )T> = o(m 2 ) and, a ff (6»i)P^ 



is identical to & H (8i) almost surely. Following this line of 
reasoning, after some manipulations and keeping the dominant 
terms, we get 

A sa max lla^ (0)N|| 2 . (27) 

9 

Even with such a simplification, finding the two first statistical 
moments of A is a hard task. However, for the case of 
uncorrelated noise, C = cr 2 I, and large enough m in a ULA, 
we can proceed further by focusing on the extreme value, A 2 
of the sampled set {z k = \\sl h (^f)N|||}^ . In this case, 

A 2 

-5*- = lnm+ (T- l)lnlnm + 7-ln(T - 1)!, (28) 
cHto 

where 7 converges weakly to a Gumble 11291 random variable. 
It should be also reminded that this result is valid when 
T = o(e[|^). See Appendix [C] for more details. Now, it is 
expected that the true regularization A has the same expansion 
with 7 converging to a different variable. The result in d28b 
leads us to 

Result 4. The unknown expectations in Result [3] can be 
computed by 

£(\ 2 ) = a 2 m(\nm+ (T - 1) In mm + £(7) - ln(T - 1)!), 

(29) 

and 

£ (A) = ffVm In to < 1 H — > , 

I 2 In to I 

(30) 

where 7 is a proper random variable and its expectation is a 
proper positive number. 

The observations in Section[V]will show the accuracy of this 
approximation, using both a theoretical and empirical value of 

*(7). 

V. Numerical Results 

In this section, we present some results supporting and 
illustrating the analysis in Sections IIV-DI and IIV-CI for the 
case of 2 sources. 

A. Consistency 

In Section IIV-CI we analyzed the consistency of CLASS 
and identified the so called pure cases for which CLASS be- 
haves consistently under the assumption of infinite estimation 
dynamic range. Equation (f22j) gives a condition for the DOA 
set to be such a case. We also showed that, asymptotically 
there exists a threshold function A8 = — that defines such a 

m 

consistency resolution. The exact value of S can be found by 
d23l . A simple numerical search shows that S = 2.267T is the 
threshold. We remind that in practice, with a finite dynamic 
range and a discretized space, this upper bound can be reduced. 

B. Asymptotic Performance 

In Section HV-DI we computed the performance of CLASS 
in pure cases in many steps. Equations d25T > and d24l) connect 
the error covariance and the statistical bias to the two first 
moments of the optimal regularization parameter A given in 
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Number of Sensors 



Fig. 1, The experimental value of €(\ 2 ) for different number of sensors 
compared to the theoretical results. The simple dashed line shows evaluation 
of )30t with £(7) = 0.58 while the dashed one with triangle shows the same 
evaluation with £(7) = 1.3. 

(1261 , The A is then approximated to d27l i for the asymptotic 
case of large m. Finally, (|27T i was further simplified to d28l l. 
assuming uncorrected noise and almost constant number of 
snapshots. Here, we compare these results to the experimental 
ones. To find out the best estimation, we need the best A value. 
Note that while theoretically sufficient, d26l l is not applicable, 
since it is actually a first nonzero term of the exact best 
such value expanded for the high SNR case. Accordingly, 
we introduce the following procedure which works for high 
enough SNR. 

Implementation 

The optimization method is based on the homotopy rule 
and following the candidate points [14] starting from a large 
value of A and no regressors (n = 0). For each n > 0, the 
corresponding candidate point can be found by the following 
iterative method recursively from the candidate point of n — 1. 
First, fix A and solve only over the desired dimension 
of DOA n. This can be done precisely by a Newton method 
starting from the candidate point of n — 1. Second, the solution 
found in Step 1 is the best solution if it is also the optimal 
solution for general 3?t over all dimensions, but not with 
an infinitesimal decreasing of A. This means that the second 
condition in ( TBl i should hold with an extra global minimum. 
Thus, find the greatest peak p of ||a H (#)N|| other than the 
estimated DOAs and update A to ^±£. Iterate the first and 
second steps until convergence to where the next candidate 
point happens. We may continue this candidate point selection 
until the correct number of n has been reached. 

As explained in Section llV-DI the first guess for £(7) in (130b 
and (|29l l is the expected value of a normalized Gumble random 
variable, the Euler-Mascheroni constant l30l . In Figure Q] the 
simple dashed line with shows the theoretical value in (|29l l 
assuming the Gumble distribution. A better approximation can 
be found by fitting the experimental estimates, which gives 
£ (7) = 1.3. Note that this is a fundamental constant for every 
DOA estimate by a half-wavelength ULA. In Figure Q] then, 
the dashed line with triangles shows such a result. 

Fixing £(7) = 1.3, and combining (f30b and (|29| > with (l24l 
and (l25T l respectively, we complete evaluating the theoretical 
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Fig. 2. The DOA MSE versus different number of sensors. The estimation 
is based on one snapshot measurement of two sources separated by AS = 
— , and waveform values si = S2 = 1, with the noise standard deviation 
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cr = 0.001. 
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Fig. 3. The statistical bias normalized by true noise standard deviation for 
different number of sensors. The dashed line is the theoretical result. The 
estimation is based on one snapshot measurement of two sources separated 

by A9 = — , and waveform values si = so = 1. 

J m 



bias and covariance. Figures [2] and [3] show the performance 
of the estimator for different number of sensors. Note that 
the source separation A9 changes by the number of sources 
as A9 = — . We obtain the experimental results by using 
fixed values of the sources si — S2 = 1 and 1000 different 
realizations of the noise vector. The noise variance is fixed to 
a 2 = 10~ 6 . However, since the performance is proportional to 
the noise level, the normalized results are shown. The results 
show agreement between the theory and the experimental 
results. 

We then compared the LASSO performance to the ML and 
conventional beamforming ones. Figures [4] and [5] compare 
the estimation Mean Squared Errors and variances of three 
different estimators; CLASS, ML and conventional beamform- 
ing, respectively. The results show that while the asymptotic 
variances of CLASS and ML methods coincide, the CLASS 
estimator has a higher asymptotic MSE. We conclude that 
CLASS modifies the solution of ML mostly by adding a 
bias term in the very high SNR regime. However, as SNR 
decreases, the MSE of CLASS reaches the one for the ML 
estimator in the SNR regime between -2 and 5 dBs. The two 
methods reach the threshold region at almost the same SNR. 
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Fig. 4. The statistical MSE for different methods versus input SNRs. The 
estimation is based on one snapshot measurement of two sources separated 

by AS = — , and waveform values si = S2 = 1. 
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Fig. 5. The statistical variance for different methods in different input 
SNRs. The estimation is based on one snapshot measurement of two sources 
separated by Ad = — , and waveform values si = S2 = 1. 



VI. Conclusion 

In this work, we studied the behavior of LASSO-based 
signal parameter estimation in terms of bias and variance. We 
remind that the aim of this work is to analyze a previously 
introduced method and not to devise a new technique. How- 
ever, the analysis needs extra elaboration due to the improper 
definition of LASSO for statistical analysis. Accordingly, the 
concept of LASSO-based estimation was first clarified by 
introducing CLASS, a generalization of LASSO, which acts 
over a continuous set of indexes O rather than the finite matrix 
indexes. Then, we introduced conditions and definitions guar- 
anteeing a good performance. Finally, we gave the theoretical 
performance under such consistency conditions. We finally 
tested our results by introducing an implementation of CLASS 
in a high SNR scenario. The results were confined to the worst 
case of the one snapshot scenario. 

From the theoretical and experimental results, one may 
conclude that the LASSO technique sacrifices performance 
only in terms of statistical bias with the same threshold SNR 
to attain a better implementation as we explained in Section 
[Vl Note that our implementation of ML by searching for the 
best initial point is of exponential complexity if the number of 
sources increase, while the computational cost of our LASSO 
implementation grows linearly with n 3 due to the Newton 



method at each stage. This is of great interest due to the 
flexibility of LASSO with the choice of the manifold, which 
may vary due to the specific problem. Note also that the 
complex-valued nature of the problem limits the implemen- 
tation techniques, in which case the current technique should 
be considered as an effective general solver. 

Last but not least, we address the RP selection issue. 
As shown in Section [Cj the performance of CLASS is 
improved by choosing a smaller value of A. Accordingly, 
one may propose implementing the noiseless LASSO, which 
theoretically guarantees the performance of the exact ML 
implementation. However, by introducing noise, some outliers 
will be introduced to the latter solution, which may be removed 
by thresholding. While unproven, we expect this thresholding 
strategy to have a higher threshold SNR, which brings the 
trade-off between the threshold and bias to account. 

Appendix A 
Uniqueness 

Here, we give a proof for uniqueness of the LASSO based 
estimator. We start by bounding the dimension of the global 
optimum as in Theorem [3] which is an interesting and useful 
result by itself. Next, we prove the uniqueness based on this 
bound. 

Lemma 1. For each matrix N, assume ||a H (f?)N||2 < A, 
where the steering vectors a(9) are of dimension m. Then, 
there exists at most m — 1 points 9i for which \\a H (#i)N||2 = 
A. 

Proof: Take 9 = ircos(9). Then, for a ULA we have 
a(9) = [1 e je e je e j28 . . . e J(™-l)*]T We may also gen- 
eralize this vector to the whole complex space as a(z) = 
[I z z 2 ... z rn - 1 ] T where z € C. Obviously, a(0) = a(e^). 
Define F{9) = \\a. H (0)N|||, T = NN fl , and T a = 
Y^, Tij for 0<a<2(m — 1). Furthermore, define 

j— i=ot — m+l 

2(m-l) 

the polynomial S{z) = T a z a — \z m ~ x . It is straight- 

forward to show that F{9) = JM- + A for z = e^ lB . Note that 
since F(z) — ^„[ z Ji + A is a complex differentiable function, 
from the Cauchy-Riemann equations we get ^ — — jj^f- 
if z = e~~i 6 . Suppose now that = [di Q% ...0 n ] is a 
global minimum of £Pf Then, due to the assumption, the 
function F{&) is always less than or equal to A and it has 
local maxima at 8 = 9i, with F(9i) = A. Then, F(zi) = and 
^{Zi) = -;^-§f (#) = 0, where z t = e~ j8i , with electrical 
angles 9{ corresponding to (9;. It can simply be concluded 
that S(zi) = S'(zi) = 0, which means that all numbers 
Zi are multiple roots of S(z). However, S(z) is of degree 
2(m — 1) and the number of multiple zeros can not be more 
than 2 iS^ll =m — l. ■ 

We also use the result in Theorem 3 in ll25l . Paper III. for 
the noiseless part. To prove the uniqueness, assume there exists 
two irreducible global minima (#i,Si) and (02, S2). Define 
9 = 61 U 6 2 and extend Si and S 2 on 6 to S 1 and S 2 as 
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Note that the pairs (0, S J ) are reducible global minima. Since 
^(X, 0, S) is a convex function of S, fixing A, we have 

#(x, e, (i - ^s 1 + /xs 2 ) < 

(1 - /i)*(X, 6/, S 1 ) + /x*(X, 0, S 2 ) = *(X, 6/, S 1 ) (32) 

for every < \i < 1. For such fi, this means that the point 
9, (1 — /i)S 1 + /iS 2 ) is also a global minimum. Note also 

1 1 2 are convex, thus 



that the two terms ^ and 



|(1 - /ijS 1 + M S 2 || lj2 = (1 - ^HS 1 !!^ + /i||S 2 || li2 (33) 



and 



(1 



i||X-A(0)((l- / i)S 1 
/i)|||X- A(6>)S 1 



i2 

If 



/4 



MS 2 ) 

llx- 



.2 

If 



A(0)S 2 ||1 (34) 



since, otherwise, ( f34l > and d33l with strict inequality results in 
strict inequality in d32l . Moreover, | |j«|j_p, is a strictly convex 
function and from (134-b we conclude that A(0)(Si — S2) = 0. 
But, this can be possible only if the dimension of 9 is more 
than m, since for n — m the matrix A(6>) is a Vandermonde 

11 gl 11 

type matrix and is full-rank. Choose fi such that j^jj < s ? ^ 
for every i Then the pair (0, (1 — njS 1 + /iS 2 ) is irreducible 
with n > to. This is a contradiction, since due to Corollary 1 
of Theorem [2] and Lemma Q] the dimension n of every irre- 
ducible global minimum 9 of &* t , is less than the dimension 
of the received vector m. 

Finally, we prove the uniqueness of ^t.ni- Note that the 
number of active DOAs for each global minimum of 3^t,n\ 
is also less than m — 1. To show that, we assume a solution 
(9, S) with n > m — 1. Take an arbitrary DOA 9 and note 
that since n > to — 1 the matrix A = [A(0) a(9)} has a 
nonempty null space. Then, we can take an (n+1) x T matrix 
M = [-Mi/] T such that AM = 0. Note that every element 
of each vector in jVa is nonzero, since every submatrix of A 
by removing columns is full rank. Thus, u can be made an 
arbitrary vector by scaling. From Theorem 3 in (25), Paper 
III., We have 

- A(0)M + a(0)i/ = -> Re(Tr(rg 1 SM H )) + \\u\\ 12 > 

' 05) 

Note also that, using Corollary 4 in 11251 . Paper HL, there 
exists a vector Z such that A(9) H Z = T^S. Combining this 
with ([33, we get Re(Tr(M ff A H Z)) + \\u\\ 2 > 0. Recalling 
that AM = sl{9)u we have Re{Tr{v H a H {9)Z)) + ||i/|| 2 > 
for every v. It is easy to verify that this is true if and only 
if ||a ff (6>)Z|| 2 < 1. Note that from the definition of Z, we 
have ||a H (6*i)Z||2 — 1. As the conditions of Lemma Q] are 
satisfied, we conclude that n < m. Like the previous part, 
the assumption of two global minima leads us to a third one 
as a convex combination of the two with n > m which is a 
contradiction. 

Appendix B 
Continuous solution path 

First, note that for each pair (X, A) and e > 0, there exists 
an a > such that if *(X, 0, S, A) < P(X, 0(X, A), A) + a 
and 9 is of bounded dimension, then for all 9 £ 9 either there 



exists a 9' £ 0(X, A) so that max{|0-0'|, ||S e -S e /||} < e, or 
||Sg/|| < e. Otherwise, using the BolzanoWeierstrass theorem, 
there exists a sequence 9k, converging to a global £P t point 
different from 9, S which is a contradiction. 

Next, note that due to Theorem [3] every 8P t solution 
dimension is bounded by to, and S can simply be confined to 
a sufficiently big but compact set e.g. ||S||i,2 < A. Over this 
set, as the pair (X',A') converge to (X, A), the continuous 
function *x',v(0, S) = *(X',0, S, A') uniformly converges 
to ^I'x,a(0,S). This means that there exists a 5 > such 
that if max{|| AX|j, || AA||} < 5, then for every proper pair 
(0,S) wehave|* x+ AX,A+AA(0,S)-*x,A(0,S)| < f . This 
leads to * x ,a(0(X + AX, A + AA), S(X + AX, A + AA)) < 
*x,a(0(X, A), S(X, A))+a. Using the results above the proof 
is complete. 

To extend the homotopy to A = 0, note that from the 
minimality of 0(X, Afe), we conclude 

' X-A(0(X,A))S(A)|%A||S(A)|| 1:2 < A||S nl || 1:2 (36) 

for all A > 0. Now, the continuity holds. Otherwise, there 
exists a sequence {Afc}^! of regularization parameters con- 
verging to zero, for each of which there exists 9 G 0(A) 
so that max{|6> - 6% ||S - S e /||} > e and ||S e || > e for 
every 9' E 9 n \. From d36l l and the uniqueness theorem, we 
conclude that the solutions are in a compact set of bounded 
dimension. Thus, there exists a subsequence of Aj, for which 
0(X, Afe) and S(A/-) converge to some and S respectively, 
different from the noiseless solutions. From d36l ). we get 

i X- A(0(X, A))S(A) < A||S n i||i i2 , and letting k tend 

F A 

to infinity, we conclude A(0)S = X. On the other hand, 
Afc||S(A fc )||i, 2 < AfeHSmlli^, and we get ||S|| lj2 < ||S„i||i, 2 . 
This is in contradiction with the uniqueness of the noiseless 
solution. 



Appendix C 
A. Computing the error covariance 

We start by A0 = A/3 + S. Note that since S is the first 
order ML error, its covariance can be found, e.g. in [27 1. 
We get Cov(A0) = ir _1 R _1 Re[D^CD )9 H T ]R _1 r _1 + 
Var(A)/3/3 T + k where k is proportional to £ (AAX). Now, 
we show that this expectation is zero and the effective terms 
are related to the second order statistics of A. 

We can write the best A value in 



A = min \ a\a > max \\el h (0)[Y + oA]|| 



(37) 



where Y is a linear (but not analytic) matrix function of AX 
and A is a constant one. Thus, Y is centered Gaussian but not 
circularly symmetric. We may show A as A(Y) or A(AX) = 
A(Y(AX)). 

The first observation is that A(oY) = |a|A(Y) for every 
complex a, which implies that A(aAX) = |a|A(AX). In 
other words, for every a > 0, {AX|A(AX) = a} = 
{aAX|A(AX) = 1} which means that £(AX|A = a) = 
<z£(AX|A = 1). We conclude that £(A)£(AX|A = 1) = 
£ A (£(AX|A)) = £(AX) = 0. Thus, £ (AX|A = 1) = and 
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£(AXA) = £\ (£(AAX|A)) = £(X 2 )£ (AX|A = 1) = 0. This 
shows that k oc £(AAX) = 0. 

B. The asymptotic extreme value expansion 

The variables Zk introduced in Section lTV-Dl are independent 
with chi squared distribution. Thus, 

1 f°° 

Pr(z 4 > a) = (r l) , J _ z T - 1 e- z dz (38) 

which is in the order of , " ,,,, e , . This can be seen , 
for example, by L'Hopital's rule. Since z^s are independent, it 
is simple to see that 

Pr(A^ < a) = (1 - Pr(zi > a)) m = (3 (39) 

Then, for large enough to, we get Pr(z^ > a) = ~ lnf3 which 

using (ED, can be written as (T - 1) In i^-™- ln(T - 

1)! = ln(— ln(/3)) — m. It can then be simplified to 

=lnm+(T-l)lnlnm + 7-ln(T-l)! + o(l) (40) 

a z to 

where 7 = — ln( — ln/3). We conclude ( f28l ). 
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